Temporal modulation processing of speech signals for noise robust ASR

Authors

Hong You

Abeer Alwan

Abstract

In this paper, we analyze the temporal modulation characteristics of speech and noise from the point of view of speech/non-speech discrimination. Although previous psychoacoustic studies [3][10] have shown that low-frequency temporal modulation components are important for speech intelligibility, no analysis of modulation components from the point of view of speech/noise discrimination has been reported. Our data-driven analysis of the modulation components of speech and noise reveals that speech and noise are more accurately classified by low-passed modulation frequencies than by band-passed ones. The effects of additive noise on the modulation characteristics of speech signals are also analyzed. Based on this analysis, we propose a frequency-adaptive modulation processing algorithm for noise-robust ASR. The algorithm is based on speech channel classification and modulation pattern denoising. Speech recognition experiments compare the proposed algorithm with other noise-robust front-ends, including RASTA and the ETSI Advanced Front-End (AFE). Recognition results show that frequency-adaptive modulation processing is promising.
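
The abstract contrasts low-passed with band-passed modulation frequencies and uses RASTA as a baseline. The Python below is a minimal sketch of that filter-type difference, not the authors' algorithm. It applies two temporal filters along time to a single sub-band log-energy trajectory: a one-pole low-pass filter and the classic RASTA band-pass filter. The 100 Hz frame rate, the 16 Hz low-pass cutoff, the RASTA pole of 0.98 (some implementations use 0.94), and all function names are illustrative assumptions.

```python
import math

FRAME_RATE_HZ = 100.0  # assumption: 10 ms frame shift, i.e. 100 frames per second


def lowpass_modulation(traj, cutoff_hz=16.0, frame_rate=FRAME_RATE_HZ):
    """One-pole low-pass filter applied along time to a log-energy trajectory.

    Unity gain at 0 Hz, so slow modulations and the average level are kept.
    The 16 Hz cutoff is an illustrative assumption, not a value from the paper.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / frame_rate)
    out = []
    y = traj[0] if traj else 0.0  # start at the first value to avoid a start-up transient
    for x in traj:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def rasta_modulation(traj, pole=0.98):
    """RASTA-style band-pass filter applied along time:

        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)

    The numerator coefficients sum to zero, so a constant offset (0 Hz) is
    removed. The z^4 look-ahead of the usual non-causal form is omitted,
    which only delays the output by four frames.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # 0.1 * [2, 1, 0, -1, -2]
    out = []
    y = 0.0
    for n in range(len(traj)):
        # FIR (numerator) part over the current and four previous frames
        acc = sum(bk * traj[n - k] for k, bk in enumerate(b) if n - k >= 0)
        y = acc + pole * y  # recursive (pole) part
        out.append(y)
    return out


def rms(xs):
    """Root-mean-square value of a list of floats."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


if __name__ == "__main__":
    n = 600
    offset = [5.0] * n  # stationary level, e.g. a fixed channel gain in the log domain
    # 4 Hz modulation, close to the typical syllable rate of speech
    mod4 = [math.sin(2.0 * math.pi * 4.0 * t / FRAME_RATE_HZ) for t in range(n)]
    print("low-pass, constant offset:", lowpass_modulation(offset)[-1])
    print("RASTA,    constant offset:", rasta_modulation(offset)[-1])
    print("RASTA,    4 Hz modulation RMS:", rms(rasta_modulation(mod4)[300:]))
```

Running it shows the constant offset passing through the low-pass filter essentially unchanged, while the RASTA output decays toward zero; a 4 Hz modulation passes both filters with little attenuation. This is the practical difference between the two filter types: a band-pass modulation filter discards the 0 Hz term, whereas a low-pass one keeps it.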


Similar resources

Robust Asr in Reverberant Environments Using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction

This paper presents techniques aimed at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement and robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum ...


Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR

The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments, in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, auditory processing studies have shown the importance of modulation-filtered spectrogram representations in robust human speech recognition. Inspired by these...


Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...


Robust energy demodulation based on continuous models with application to speech recognition

In this paper, we develop improved schemes for simultaneous speech interpolation and demodulation based on continuous-time models. This leads to robust algorithms to estimate the instantaneous amplitudes and frequencies of the speech resonances and extract novel acoustic features for ASR. The continuous-time models retain the excellent time resolution of the ESAs based on discrete energy operato...


Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor...


Publication date: 2009
